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1. INTRODUCTION 

The data warehouse modeling phase is the first and most crucial step in the business intelligence (BI) 
process since it determines the quality of all other steps, including data transformation, data analysis, 
information extraction, and online analytical processing (OLAP). Thus, because the design of a data warehouse 
is frequently described as a multi-tier system in which data from one layer is generated from data from the 
previous layer, establishing a data warehouse is a difficult and complex operation [1]. As a result of this 
architecture, several strategies, and approaches for developing different parts of data warehouses have been 
offered. 

Through this work, the authors use a conceptual model summarizing the analysis and design of the 
information system of the organization. From this model, future objects decision models such as relational and 
OLAP cube models will be automatically generated. The decisional model will be represented as a 
multidimensional schema; indeed, multidimensional modeling is the most commonly used representation of 
data warehouse schemas [2]. Wherefore, a lot of time and effort has gone into data warehouse design, and 
several methods for automated data warehouse modeling have been developed [3]. Although some of these 
methods adopted model driven architecture (MDA), none of them have gained widespread consensus [4]. 

In this paper, the authors propose a new method for designing and implementing data warehouse 
relational and OLAP cubes based on the MDA approach [5]. To do so, the UML class diagram is used to 
present the multidimensional schema as a platform independent model (PIM). Afterwards, from this latter, two 
platform specific models (PSMs) which are relational and OLAP cube are obtained. At last, the SQL code is 
generated from the relational model for building a data warehouse in relational database management system 
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(RDBMS) and the XML code is generated to implement an OLAP cube, which can be integrated in a server 
like Pentaho schema workbench. The model-to-model transformation can be done using query-view- 
transformation (QVT), and Acceleo language is also used to generate code from the models. The advantage of 
these standards is the bidirectional execution of transformation rules. In this respect, the purpose of this research 
is to rethink the work [6], [7] by using MDA concepts to develop the transformation rules aimed at generating 
OLAP cubes and data warehouse relational implementation, and then any user can create a data warehouse 
without the need for detailed knowledge on BI tools. 

This paper is organized as follows: related works are presented in the section 2, the section 3 explains 
our proposed method. A case study is presented in the section 4, and the section 5 focuses on the results and 
possible extensions. The last section concludes this paper and presents some perspectives. 


2. RELATED WORK 

Several research projects have been conducted in recent years to develop data warehouse systems. 
This section presents a brief overview of some approaches that make use MDA viewpoints in the data 
warehouse life cycle. Firstly, by utilizing the multidimensional partitioning approach and a multidimensional 
and spatio-temporal design pattern, which will direct the generation of the data schema, Batoure [8] use model- 
driven engineering to coordinate the overall process of modelling, implementing, and powering a decision 
system. On the other hand, Srai et al. [9] proposed a fact-dimension model. It is a data warehouse graphical 
conceptual model. They also demonstrated how to generate a data warehouse schema from the entity- 
relationship diagram's data sources. 

Several data warehouse case studies are presented in [10]-[13]. The star schema and their variants are 
used in the data warehouse design, which is based on a relational approach employing tables, columns, and 
foreign keys. The authors’ concern was limited to the representation of relational data warehouses, and they 
disregarded any other technologies. Nevertheless, this work is recognized as a standard in the field of data 
warehouses. In contrast, Beggar et al. [14] suggested a method that uses QVT relations to create a 
multidimensional model from user requirements. The generation of the final code and PSM model was not 
explained by the authors, and in a previous study [15] used the MDA approach to create a data warehouse that 
allows them to store temporal structures associated with time-varying dimension levels. The authors identified 
the transformation rules to generate the SQL code to create the associated tables and the historical data 
warehouse. The use of data in the OLAP model was not discussed by the authors. 

What is more, Blanco et al. [16] proposed a method to generate the relational and CWM OLAP models 
using the MDA approach. The two models obtained are independent of specific platforms. Except for the 
relationship transformation of type many to many, the authors defined the transformations rules with the QVT 
language to generate the OLAP model, and the study did not discuss how the cube final code was resolved. 

Last but not least [6], [7] suggested an MDA approach to obtaining an OLAP cube and its 
implementation by automating the transformations process from metamodels. Compared to the mentioned cited 
works, this paper aims to rethink and to complete the work presented in [6], [7], by applying the standard QVT 
and Acceleo to develop the transformation rules aiming at automatically generating the SQL code of data 
warehouse relational model and XML file of OLAP cube. It is the only work for reaching this goal. 


3. PROPOSED METHOD 

The MDA approach opted for this work is the modeling and template approaches to generate the 
relational model of data warehouse and OLAP cube. As mentioned above, these approaches require a source 
and a target metamodels. This section presents the various meta-classes forming the source metamodel of 
conceptual multidimensional and the targets metamodels of data warehouse relational and OLAP cube. 


3.1. Multidimensional source metamodel for data warehouse 

The source metamodel proposed for the transformation, is a simplified metamodel containing facts 
and dimensions. A fact is made up of attributes that represent measures and degenerate dimensions that 
represent the dimensions that surround the fact. The dimension is made up of bases that represent various levels 
of hierarchy. Each base has a descriptor, dimension characteristics, and an ID that acts as the base key. The 
fact_attribute class now includes the attribute aggregation function. This metamodel is an extension of the work 
done by Moukhi et al. [17]. Figure 1 shows the source metamodel. 


Modeling and automatic generation of data warehouse using model-driven ... (Redouane Esbai) 


1868 O ISSN: 2502-4752 


B Package 


B Fact_Attribute 


3 name: EString 
© AggregationFunction : EString 
© DerivationRule : EString 


[1..*] facts 
3 name: EString 


| [1..*] dimensions 


[0..*] mesures 


B Dimension_Attribute 


| B Dimension 


3 name: EString 

© name : EString [2..*] dimensions © derivationRule : EString 

© isTime : EBoolean = false © isOptional : EBoolean = false 
& jsDegenerate : EBoolean = false © type: EString 


--“] dimension_attribute 


[1..1] base 


© name: EString 


[0..1] rolls_Up_to 


[0..1] specialize 


B Descriptor 

& name : EString - 

© derivationRule : EString | [0.1] descriptor 
Bow 

© name : EString 


Figure 1. Simplified conceptual multidimensional source metamodel 
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3.2. The OLAP cube and data warehouse relational target metamodels 

The proposal of this study aims to automatically generate two PSM (the relational model and the 
OLAP model) needed to create respectively the structures of data warehouse and cube. Therefore, to design 
the PSMs, the UML [18] class diagrams metamodels are adopted to present following the OLAP and data 
warehouse relational metamodels as shown in Figure 2 since they are going to be used in this work. The OLAP 
cube metamodel presented in Figure 2(a), is structured into a schema class that owns all elements of an OLAP 
model, i.e., dimensions and cubes. In the OLAP metamodel, each dimension is a collection of hierarchies and 
levels. In addition, cubes are used to store measures and they are related to the dimensions. These latter can 
also contain multiple and diverse hierarchical arrangements including specialized hierarchies that support 
ordering levels by hierarchy levels. The work of Prat et al. [19] contains a more complete version. 

According to Kimball and Ross [20], the most common representation for multidimensional models 
is relational; thus, this work assumes that the second PSM is the relational model. Figure 2(b) illustrates a 
simplified CWM [21] relational metamodel in which a schema is composed of tables with columns and keys. 
Each key is a column which can be a primary or foreign key. Each fact table contains numerous foreign keys 
derived from its dimensions. The fact table also has numerical columns to represent measures and each 
dimension can have columns to represent dimension attributes. For simplicity reasons, this study uses the 
simplified relational metamodel shown in Figure 2(b). 


3.3. Transformation process 

With the multidimensional model as the starting point and the production of the implementation code as 
the finish line, this work's proposal aims to model data warehouses using the MDA approach as shown in Figure 3. 
The authors specify the required metamodels and the main model transformations for each MDA level [22]. The 
authors first created ECORE [23] models for the source and target meta-models. Many metamodels require 
multiple model transformations to be created. From these created metamodels, two transformations techniques 
are needed: model to model (M2M) and model to text (M2T). The M2M transformation approach, by using the 
QVT operational language [24], is utilized to produce two PSM models from one PIM model. Next, the M2T 
transformation is executed with Acceleo language [25] to automatically generate the SQL and XML [26] codes 
required for creating OLAP cubes and relational data warehouses from PSM models. 


Indonesian J Elec Eng & Comp Sci, Vol. 30, No. 3, June 2023: 1866-1874 


Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 O 1869 


© name : EString 


8 schema 


3 name: EString 


[1..*] schemaelement 


B ModelElement 


3 name: EString 


[1..*] cube 


[Bowe | 

Samer tung |!" dimension Ss 

ee) 3 name: EString 
© type : EString [1..1] primarykey 


{0..1] primarykey 


[1..*] dim B SchemaElement 


© foreignKey : EString 
B Table 
[1..*] measure 


‘as 
© name : EString [1..*] hierarchy 


© datatype : EString B Hierarchy 


= aggregator : EString © name : EString 
= primaryKey : EString 


[1..*] column 


B column 


1..*] column 


= name : EString 
© type : EString 


[1..1] reference [1..1] primaryKeyCol 


{0..1] importedKeyCol 


[0..1] importedkeyOf [0..] primaryKeyOf 


= name: EString 


[1..*] foreignkey 


[0..1] membre 
[1..*] level 


3 name : EString 
© type : EString 
© levelType : EString 


[0.."] foreignkeys: 


8 DimensionTable E FactTable . 
aa) 
(a) (b) 


3 name: EString 


{0..*] membre 


Figure 2. Simplified data warehouse relational (a) OLAP cube target metamodels and (b) a simplified CWM 


f PIM 


| (MD Conceptual) i) 
= : 


( PS [ PSM 
“~e _(O1AP cube) 
M2T (Fe 
{Acceleo) {Acceleo) 


SQL code 


XML code 
Mondrian schemas 
= 
OLAP 
om (exp 


Mysal RDBMS Pentaho Schema Workbench 


Figure 3. The proposed transformation process in MDA approach 


3.3.1. The M2M transformation rules (PIM to PSM) 

The first transformation uses, in entry, a model of multidimensional type, and in output a model of 
OLAP cube type. The first rule of transformation defines the mapping between all the multidimensional 
package's elements and the OLAP schema's elements. The purpose of the second rule is to transform each fact 
into a cube by creating the measures and dimensions for each cube. It is a matter of transforming each 
dimension to an OLAP dimension, which also entails the generation of hierarchies and levels. To accomplish 
this, the authors define a function that creates hierarchies by looping over the associations of type rolls-up-to, 
without forgetting to give the various elements names and types. Figure 4 presents the principal part of the 
M2M transformation rules using the QVT language. 
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The second transformation uses a multidimensional model in the input and generates a ROLAP model 
in the output. However, the authors proposed just below the main rules to obtaining a model of the data 
warehouse relational, from the conceptual model. A summary of these rules is presented in Figure 5. 


[a MD2OLAP.qvto x 


modeltype MDMM uses “http://www.example.org/mdmm"; 

modeltype OLAPMM uses “http://www.example.org/olapmm"; 

transformation MO2OLAP(in mdModel:MOMM, out olapModel:OLAPMM); 
=main() { 

mdModel.objects()[Package]->map MOPackage2SchemaOLAP(); 


} 
= mapping Package: :MDPackage2SchemaOLAP() : Schema { 
name:=self.name; 
cube:=self.facts->map fact2cube(); 


“mapping Fact::fact2cube() : Cube { 
name:=self.name.toLower(); 
measure:=self.mesures->map factAtt2measure(); 
dimension: =self.dimensions->map Dim2Dim(); 


} 


Figure 4. QVT code of M2M transformation from multidimensional to OLAP model 


[@ MD2ROLAP.qvto X 
transformation MD2ROLAP(in mdModel:MDMM, out rolapModel:ROLAPMM) ; 


>main() { 
mdModel.objects()[Package]->map MDPackage2SchemaROLAP() ; 


mapping Package: :MOPackage2SchemaROLAP() : Schema { 
name:=self.name+"_ROLAP"; 
schemaelement:=self.dimensions->map dim2Table(); 
schemaelement+=self.facts->map fact2Table(); 


Figure 5. QVT code of M2M transformation from multidimensional to data warehouse relational model 


3.3.2. The M2T transformation rules 

Unlike M2M, Acceleo language is used to implement M2T transformations. The transformation rules 
towards the SQL and XML codes are generated to respectively build the data warehouse and its equivalent 
OLAP cubes as shown in Figure 6. First, the database is created via the SQL code generator. The subsequent 
step is the generation of the different dimension’s tables correspondingly to the relational model. Finally, a fact 
table is created that contains the foreign keys referencing primary keys in dimensions tables already generated. 
The SQL code to create all the tables corresponds to a “Create Table”. Figure 6(a) bellow presents the 
transformation rules with Acceleo to generate an SQL file. 

Then, the process of creating an OLAP cube still depends on the platform used and requires a thorough 
understanding of its tools. In fact, the multidimensional structures are defined by the OLAP model. Therefore, the 
proposed approach aims to automatically generate OLAP cube from the code. The most popular BI platforms were 
compared, and the results showed that each platform had different cube code extensions and templates. In this paper, 
the authors explain the automatic generation of XML code for the Mondrian schemas. A Mondrian schema is a 
structure of a multidimensional database defined by an XML file and Pentaho schema workbench platform can be 
used to create Mondrian schemas. It's interesting to note that the relational data warehouse must already exist to 
generate the OLAP cube. Figure 6(b) presents the transformation rules with Acceleo to generate an XML file. 


Ee eels (3) OLAP2XML.mtl x 
[comment encoding = UTF-8 /] 


ROLAP2: x ://www. . z 4 
Leomi re ROLAPZAIEL NETpt/ Leet, exempta .crg/rolepmn.)4 [module OLAP2xML("ht www. example.org/olapmn' )] 


= [template public OLAP2XML(aSchema : Schema)] 


=[template public ROLAP2SQL(aSchema : Schema)] [comment @nain/] 


comment @main/ } 
t Gm: ] oncat(’.xml"), false, 


E -name/]"> 
[for (cube : Cube | aSchema.cube)] 
[cube2xmt(cube)/] 


[file (aSchema.name.concat('.sql'), false, 'UTF-8')] 
CREATE DATABASE [aSchema.name/]; 

USE [aSchema.name/]; 

[for (dim : DimensionTable | aSchema.schemaelement)] Die. 
[dim2sQu(dim)/] [/File] 


(/for] [te 
: emplate] 
[for (fact : FactTable | aSchema.schemaelement)] Tinplata puede cokisn Scuba. WRIT 


([fact2sql(fact)/] <Cube name="[cube.name/]" visible="true” cache="true" enabled="true"> 
(/for] <Table name="[cube.name/]"> 
[/file] </Table 
[/template] 7 7 . . [for (dim : nsion | cube.dimension)] 
[template public dim2SQL(dim : DimensionTable)] apes /] 
CREATE TABLE [dim.name/] ( 
[for (c : Column | dim.column) separator (',\n') ] [c.name/] [c.type/][/for], [for (measure : Measure | cube.measure)] 
PRIMARY KEY ([dim.primarykey.name/]) [meassure2XML(measure)/] 
3 [/for] 
</Cube> 
[/template] [/template] 


(a) (b) 
Figure 6. M2T transformation with Acceleo to generate a (a) SQL and (b) XML code 
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4. CASE STUDY 

To validate the proposed approach, several tests were conducted. For example, the authors considered 
the UML class diagram model representing a business, among its activities, the sale of products to its customers; 
these products are classified under categories, and several subcategories form a category. The UML model 
instance of the company information system is represented in Ecore file. This file is composed of the element 
package composed by a fact SalesOrderDetail which contains measures of sales like sum of sales and quantities. 
The package also contains four dimensions, Customer, Product, ProductCategory, and ProductSubCategory that 
each contains attributes which characterize them. Figure 7 presents the model instance of class diagram. This 
model is provided as input for the QVT transformation. As output of this transformation, a file in XMI format is 
automatically generated. It represents the OLAP cube and the company data warehouse. 


+ customerlD: Integer [1] 
+ customerName: String [1] 


+ customerType: String [1] 


+ productiD: Integer [1] 

+ productName: String [1] — 

+ standardCost: Double [1] = SalesOrderDetail 

+ listPrice : Double [1] & + salesOrderDetaillD: Integer [1] 
+ style: String [1] < & + orderQty: Double [1] 

+ color: String [1] ©& sales; Double [1] 
+ dayToMenufacture; Integer [1] 


E ProductCategory 
G+ productCategorylD: Integer [1] 
& + productCategoryName: String [1] 


Haomumuoaga 


2 ProductSubcategory 
& + productSubcategory!D: Integer [1] 
© + productSubcategoryName: String [1] 


Figure 7. UML class diagram source model 


5. RESULTS AND DISCUSSION 

The transformation process is implemented using Eclipse platform. After defining the source and 
target metamodels (Ecore files), then the transformation rules are written to automatically generate the 
relational and OLAP models from the multidimensional one. The objective consists in applying the MDA 
approach to develop a data warehouse, in fact, there are different kinds of model transformations are defined 
in MDA, in this case, the authors are interested in transformations M2M (PIM to PSM) using QVT language 
and M2T (PSM to Code) using Acceleo language, in the first case a source UML model was transformed into 
two targets: OLAP cube and data warehouse relational model. Figure 8 shows the results after applying the 
M?2M transformation rules. The first PSM result as shown in Figure 8(a) is an OLAP schema that is composed 
of a cube table named SalesOrderDetail surrounded by dimensions: Customer, Product, Category, 
SubCategory. The second PSM result as shown in Figure 8(b) is a relational that is composed of a table fact 
named SalesOrderDetail surrounded by other tables dimensions: Customer, Product, Category, SubCategory. 

The next step is to define the transformation rules M2T using Acceleo to generate the SQL and XML 
code. This transformation uses as input the OLAP and data warehouse relational resulting models. In practice, 
writing the M2T transformation presents no difficulties; it simply entails creating a text file containing the 
transformation rules. Figures 9 and 10 illustrate the results after applying the M2T transformation rules. The 
first application generates SQL code for a sales management database as shown in Figures 9 and 11 shows the 
resulting OLAP cube after executing the obtained XML file on Pentaho schema workbench and processing it. 
The second application generates an XML file of Mondrian schema from an OLAP cube. 
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PS) OLAP_SALES.OLAPMM x ® ROLAP_SALES.ROLAPMM x . 
- [% Resource Set 
[& Resource Set ~ B platform:/resource/MD2ROLAP/ROLAP_SALES.ROLAPMM 
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~ & platform:/resource/MD2OLAP/OLAP_SALES.OLAPMM <4 (Slnaiencicmn eile Citcnmas 
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+ Foreign Key productCategorylD 
+ Level productSubcategoryName + Foreign Key productSubcategoryID 
Selection Parent| List| Tree | Table | Tree with Columns) Selection | Parent] List| Tree| Table| Tree with Columns 


(a) (b) 


Figure 8. PSM results (a) is an OLAP schema that is composed of a cube table named SalesOrderDetail and 
(b) relational that is composed of a Table Fact named SalesOrderDetail 


CREATE DATABASE Sales Managment_ROLAP; 
USE Sales _Managment_ROLAP; 


CREATE TABLE Customer ( 
customerID Integer, customerName VARCHAR(30), customerType VARCHAR(30), PRIMARY KEY (customerID) 
; 
CREATE TABLE Product ( 
productID Integer, productName VARCHAR(30), standardCost NUMERIC, listPrice NUMERIC, 
style VARCHAR(30), color VARCHAR(30), dayToManufacture NUMERIC, PRIMARY KEY (productID) 
; 
CREATE TABLE ProductCategory ( 
productCategoryID Integer, productCategoryName VARCHAR(30), PRIMARY KEY (productCategoryID) 
yy 
CREATE TABLE ProductSubcategory ( 
productSubcategoryID Integer, productSubcategoryName VARCHAR(30), PRIMARY KEY (productSubcategoryID) 
Ve 
CREATE TABLE SalesOrderDetail ( 
PK_SalesOrderDetail Integer, salesOrderDetailID Double, orderQty Double, sales Double, customerID Integer, 
productID Integer, productCategoryID Integer, productSubcategoryID Integer, PRIMARY KEY (PK_SalesOrderDetail), 
FOREIGN KEY (customerID) REFERENCES Customer (customerID), 
FOREIGN KEY (productID) REFERENCES Product (productID) , 
FOREIGN KEY (productCategoryID) REFERENCES ProductCategory(productCategoryID), 
FOREIGN KEY (productSubcategoryID) REFERENCES ProductSubcategory (productSubcategoryID) 
te 


Figure 9. SQL file generated for building data warehouse 


Sales Managment” > 
¥<Cube name="salesorderdetail" visible="true" cache="true" enabled="true"> 
<Table name="salesorderdetail"> </Table> 
> <Dimension type="StandardDimension" visible="true" foreignKey="customerID" highCardinality="false" name="customer"> 


</Dimension> 
»<Dimension type="StandardDimension" visible="true" foreignKey="productID" highCardinality="false" name="product"> 


</Dimension> 


><Dimension type="StandardDimension" visible="true" foreignKey="productCategoryID" highCardinality="false" name="productcategory"> 


</Dimension> 
»<Dimension type="StandardDimension" visible="true" foreignKey="productSubcategoryID" highCardinality="false" name="productsubcategory"> 
</Dimension> 
<Measure name="orderQty" column="orderQty" datatype="Numeric" aggregator="sum" visible="true"> </Measure> 
<Measure name="sales" column="sales" datatype="Numeric" aggregator="sum" visible="true"> </Measure> 
</Cube> 


Figure 10. XML file generated of OLAP cube 
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Figure 11. The generated OLAP cube on Pentaho schema workbench 


6. CONCLUSION 

In this paper, an MDA approach is proposed to transform a UML class diagram representing a 
multidimensional conceptual as a PIM model into a data warehouse repository and its OLAP cube as PSMs models. 
The main advantage of integrating MDA to develop a data warehouse is improving productivity and decreasing 
implementation cost. On the other hand, the metamodels used in the process of data warehouse development, are 
proposed just for PIM and PSM. The transformations rules were developed using QVT to transform the class 
diagram into data warehouse relational and OLAP cube models, and then the SQL and XML codes were generated 
in order using Acceleo to create the multidimensional structures of data warehouses and OLAP cubes. 

The main contribution of this work is to provide the required metamodels and thorough transformation 
rules to obtain the relational data warehouse and OLAP cube from the conceptual model. After generating the 
relational and OLAP models, the cube implementation code is automatically created. In addition, this paper 
aims to rethink and to complete the studies presented in the work by using the standards QVT and Acceleo to 
write transformation rules aimed at automatically generating SQL code to create a data warehouse relational 
on a MySQL database and an XML Mondrian code for the Pentaho schema workbench platform, where the 
deployment of the cube that was automatically implemented. 

For the future works, this study can be extended to include the CIM metamodel as the first level of MDA 
to design the user requirements and complete the transformation engine, from data warehouse requirements 
analysis to code generation and deployment of OLAP cubes and data warehouse relational. Furthermore, this 
work aims to develop a tool to gather the whole elements of the approach and be able to use it efficiently. 
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